Homework 1

Author

Aziz Al Mezraani

Professional wrestling, while not everyone’s cup of tea, is big business. What started as a carnival act has turned into a global entertainment industry. Netflix recently started showing Monday Night Raw, a program from the biggest North American wrestling company, WWE – this deal is reportedly worth $5 billion. Like any large entity, WWE is not without competition, drama, and scandal.

General Tips

This is very much a step-by-step process. Don’t go crazy trying to get everything done with as few lines as possible. Read the documentation for the AlphaVantage API! Carefully explore the pages from cagematch. There isn’t a need to get too fancy with anything here – just go with simple functions and all should be good. Don’t print comments, but use normal text for explanations.

Step 1

In the calls folder, you’ll find 4 text files – these are transcripts from quarterly earnings calls. Read those files in (glob.glob will be very helpful here), with appropriate column names for ticker, quarter, and year columns; this should be done within a single function. Perform any data cleaning that you find necessary.

import glob
import os
import re

import pandas as pd


def read_calls(pattern):
    """Read every transcript matching `pattern` into one DataFrame, adding
    ticker, quarter and year columns parsed from each file name."""
    data_frames = []

    for file in glob.glob(pattern):
        # One row per transcript line (pd.read_table uses the first line as the header)
        df = pd.read_table(file)

        filename = os.path.basename(file)

        # The file name holds the ticker (before "_q"), the quarter (q1-q4) and a 4-digit year
        df['ticker'] = re.search(r'([a-zA-Z]+)_q', filename).group(1).upper()
        df['quarter'] = re.search(r'q[1-4]', filename).group(0)
        df['year'] = int(re.search(r'\d{4}', filename).group(0))

        data_frames.append(df)

    return pd.concat(data_frames)


dataframe = read_calls("C:/Users/azizm/Documents/Uni/Spring 2025/UDA/calls/*.txt")

print(dataframe)
                                  Company Participants ticker quarter  year
0                                     James Marsh - IR    EDR      q3  2023
1                                    Ari Emanuel - CEO    EDR      q3  2023
2                                   Jason Lublin - CFO    EDR      q3  2023
3                     Mark Shapiro - President and COO    EDR      q3  2023
4                         Conference Call Participants    EDR      q3  2023
..                                                 ...    ...     ...   ...
133  So in reverse order, ad share deal with Twitch...    WWE      q2  2023
134                                        Seth Zaslow    WWE      q2  2023
135  Well, thank you, everyone, for joining us on t...    WWE      q2  2023
136                                           Operator    WWE      q2  2023
137  This concludes today's call. Thank you again f...    WWE      q2  2023

[701 rows x 4 columns]
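pd.read_table treats the first line of every transcript as a header, which is why the text column above is named "Company Participants" instead of holding that line as data, and the index restarts at 0 for each file. A minimal sketch of an alternative, assuming the file names follow a hypothetical "<ticker>_q<quarter>_<year>.txt" pattern (consistent with the regexes above, but not confirmed), parses all three fields with one named-group regex and keeps every non-blank line as data. The helper names are illustrative only.

```python
import re

import pandas as pd

# Hypothetical file-name pattern "<ticker>_q<quarter>_<year>", e.g. "wwe_q2_2023.txt";
# adjust the regex if the files in the calls folder are named differently.
CALL_NAME = re.compile(r'(?P<ticker>[A-Za-z]+)_q(?P<quarter>[1-4])\D*(?P<year>\d{4})')


def parse_call_filename(filename):
    """Return (ticker, quarter, year) parsed from a transcript file name."""
    match = CALL_NAME.search(filename)
    if match is None:
        raise ValueError(f"unexpected transcript file name: {filename!r}")
    return match['ticker'].upper(), int(match['quarter']), int(match['year'])


def transcript_frame(lines, filename):
    """Build a DataFrame with one row per non-blank transcript line.

    Nothing is consumed as a header, so the first line of the
    transcript is kept as data."""
    ticker, quarter, year = parse_call_filename(filename)
    text = [line.strip() for line in lines if line.strip()]
    return pd.DataFrame({'text': text, 'ticker': ticker,
                         'quarter': quarter, 'year': year})
```

Each file could then be opened with open(path, encoding='utf-8') (the encoding is an assumption), passed in together with os.path.basename(path), and the pieces combined with pd.concat(frames, ignore_index=True) so the index runs continuously instead of restarting per file.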

Step 2

Use the AlphaVantage API to get daily stock prices for WWE and related tickers for the last 5 years – pay attention to your data. You cannot use any AlphaVantage packages (i.e., you can only use requests to grab the data). Tell me about the general trend that you are seeing. I don’t care which viz package you use, but plotly is solid and plotnine is good for ggplot2 users.
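Because only requests is allowed, checking the response is up to us. If a call hits the rate limit or uses a bad key, the JSON reply (based on typical AlphaVantage behaviour, an assumption here) carries a message instead of the "Time Series (Daily)" key, and indexing it directly fails with a bare KeyError. A sketch of a hypothetical helper, daily_frame, that validates the payload, converts the prices to floats, and sorts them by date:

```python
import pandas as pd

SERIES_KEY = 'Time Series (Daily)'


def daily_frame(payload, start):
    """Convert a TIME_SERIES_DAILY JSON payload into a date-sorted DataFrame
    of float prices, keeping only rows on or after `start`."""
    if SERIES_KEY not in payload:
        # Assumption: error and rate-limit replies lack the series key,
        # so fail with a readable message instead of a bare KeyError.
        raise ValueError(f"no daily series in AlphaVantage response: {payload}")
    df = pd.DataFrame.from_dict(payload[SERIES_KEY], orient='index', dtype=float)
    df.index = pd.to_datetime(df.index)
    df = df.sort_index()
    return df[df.index >= pd.Timestamp(start)]
```

With it, each ticker reduces to daily_frame(requests.get(url).json(), five_years_ago).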

import os
from datetime import datetime

import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
import requests

# Assumption: the AlphaVantage key is stored in the ALPHAVANTAGE_API_KEY
# environment variable instead of being hard-coded in the notebook
API_KEY = os.environ['ALPHAVANTAGE_API_KEY']
tickers = ['WWE', 'TKO', 'EDR']
five_years_ago = datetime.now() - pd.DateOffset(years=5)

fig = go.Figure()

for ticker in tickers:
    url = ('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY'
           f'&symbol={ticker}&outputsize=full&apikey={API_KEY}')
    data = requests.get(url).json()

    # Daily bars are keyed by date, with fields such as '1. open' and '4. close'
    df = pd.DataFrame.from_dict(data['Time Series (Daily)'], orient='index', dtype=float)
    df.index = pd.to_datetime(df.index)
    df = df[df.index >= five_years_ago].sort_index()

    fig.add_trace(go.Scatter(x=df.index, y=df['1. open'], mode='lines', name=f'{ticker} Stock'))

pio.show(fig)

The general trend is positive. The drop occurred when WWE merged with UFC to form TKO, and the price recovered soon afterwards. Over the last year the growth appears to accelerate sharply. This came after Paul Levesque (Triple H) fully took over from Vince McMahon.
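WWE, TKO and EDR trade at different price levels, and the WWE series stops when WWE became part of TKO in September 2023, so raw prices on one axis make the trends hard to compare. One common way to compare them is to rebase each series to 100 at its first observation, so every line reads as growth relative to its own start. A minimal sketch with a hypothetical rebase helper:

```python
import pandas as pd


def rebase(prices):
    """Scale every column so that its first non-missing value becomes 100."""
    # First available price of each column (series start on different dates)
    first = prices.apply(lambda column: column.dropna().iloc[0])
    # Divide each column by its own starting price, aligned on column names
    return prices.div(first) * 100
```

The input would be one column per ticker, e.g. pd.concat({'WWE': df_wwe['1. open'], 'TKO': df_tko['1. open'], 'EDR': df_edr['1. open']}, axis=1).sort_index(); sorting first matters because AlphaVantage returns the newest dates first.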

Step 3

Just like every other nerdy hobby, professional wrestling draws dedicated fans. Wrestling fans often go to cagematch.net to leave reviews for matches, shows, and wrestlers. The following link contains the top 100 matches on cagematch: https://www.cagematch.net/?id=111&view=statistics

* What is the correlation between WON ratings and cagematch ratings?

** Which wrestler has the most matches in the top 100?

*** Which promotion has the most matches in the top 100?

**** What is each promotion’s average WON rating?

***** Select any single match and get the comments and ratings for that match into a data frame.

from bs4 import BeautifulSoup

link = 'https://www.cagematch.net/?id=111&view=statistics'

hot100_req = requests.get(link)

hot100_soup = BeautifulSoup(hot100_req.content, 'html.parser')

# Select each column of the statistics table once, then pair the results up by position
promotions = hot100_soup.select('.ImagePromotionLogoMini')
match_links = hot100_soup.select('.TCol a[href*="111"]')
won_ratings = hot100_soup.select('.starRating')
ratings = hot100_soup.select('.Rating.Color9')

df_matches = []

for i in range(100):
    df_matches.append({
        "Promotion": promotions[i]['title'],
        "Match": match_links[i].text,
        "WON Rating": won_ratings[i].text,
        "Rating": ratings[i].text
    })

df_matches = pd.DataFrame(df_matches)

df_matches['Rating'] = df_matches['Rating'].astype(float)

df_matches['WON Rating'] = df_matches['WON Rating'].apply(lambda star_string: 
    star_string.count('*') + 
    (0.25 if '1/4' in star_string else 
      0.5 if '1/2' in star_string else 
     0.75 if '3/4' in star_string else 0))

df_matches['WON Rating'] = df_matches['WON Rating'].replace(0, pd.NA)

print(df_matches)
                  Promotion  \
0   New Japan Pro Wrestling   
1        Pro Wrestling NOAH   
2   New Japan Pro Wrestling   
3   New Japan Pro Wrestling   
4   New Japan Pro Wrestling   
..                      ...   
95      All Elite Wrestling   
96  All Japan Pro Wrestling   
97            Ring Of Honor   
98      All Elite Wrestling   
99  New Japan Pro Wrestling   

                                                Match WON Rating  Rating  
0                     Kazuchika Okada vs. Kenny Omega        6.0    9.81  
1                  Kenta Kobashi vs. Mitsuharu Misawa        5.0    9.80  
2               Katsuyori Shibata vs. Kazuchika Okada        5.0    9.78  
3                        Kenny Omega vs. Will Ospreay       6.25    9.76  
4                     Kazuchika Okada vs. Kenny Omega        7.0    9.76  
..                                                ...        ...     ...  
95  Cash Wheeler & Dax Harwood vs. Jay White & Jui...       5.25    9.47  
96                   Kenta Kobashi vs. Steve Williams       4.75    9.47  
97  CIMA, Masato Yoshino & Naruki Doi vs. Dragon K...        5.0    9.46  
98                Konosuke Takeshita vs. Will Ospreay       5.75    9.46  
99                  Hiroshi Tanahashi vs. Kenny Omega       5.75    9.46  

[100 rows x 4 columns]
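The star-string lambda above can also be written as a named, testable function. Replacing 0 with pd.NA is what leaves the WON Rating column with object dtype (visible as "dtype: object" in the q4 output); returning math.nan instead keeps the column float64. A sketch with a hypothetical won_to_float helper that follows the same counting rule as the lambda:

```python
import math

# Fractional star suffixes used in WON ratings
FRACTIONS = {'1/4': 0.25, '1/2': 0.5, '3/4': 0.75}


def won_to_float(stars):
    """Convert a WON star string such as '*****1/4' to a number (5.25).

    Strings without any stars or fraction return NaN rather than 0."""
    fraction = next((value for text, value in FRACTIONS.items() if text in stars), 0.0)
    total = stars.count('*') + fraction
    return total if total > 0 else math.nan
```

It would be applied as df_matches['WON Rating'] = df_matches['WON Rating'].apply(won_to_float).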

q1: What is the correlation between WON ratings and cagematch ratings?

import plotly.express as px
import plotly.io as pio

filtered_df = df_matches.dropna(subset=['WON Rating'])

correlation = filtered_df['WON Rating'].corr(filtered_df['Rating'])
print(f"Correlation between WON ratings and cagematch ratings: {correlation}")

fig1 = px.scatter(filtered_df, x='WON Rating', y='Rating', title='Scatter Plot of WON Ratings vs. Cagematch Ratings')
fig1.update_layout(xaxis_title='WON Rating', yaxis_title='Cagematch Rating')

pio.show(fig1)
Correlation between WON ratings and cagematch ratings: 0.3142055145382091

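Answer: The correlation between WON ratings and cagematch ratings is 0.31, a weak positive correlation. Part of the weakness is expected: every match in the top 100 has a cagematch rating between 9.46 and 9.81, and restricting one variable to such a narrow range tends to dampen the correlation.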
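WON ratings move in quarter-star steps, so they behave more like ranks than like continuous measurements. A rank-based (Spearman) correlation is a reasonable cross-check on the Pearson value. A minimal sketch, computed as the Pearson correlation of average ranks so it needs nothing beyond pandas (spearman is a hypothetical helper name):

```python
import pandas as pd


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    # Keep only rows where both values are present
    both = pd.concat([x, y], axis=1).dropna()
    # Ties receive their average rank (pandas' default method)
    ranks = both.rank()
    return ranks.iloc[:, 0].corr(ranks.iloc[:, 1])
```

Applied here it would be spearman(filtered_df['WON Rating'].astype(float), filtered_df['Rating']).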

q2: Which wrestler has the most matches in the top 100?

name_counts = {}

for match in df_matches['Match']:
    names = re.findall(r'[A-Z][a-z]* [A-Z][a-z]*', match)
    for name in names:
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1


name_counts_df = pd.DataFrame(list(name_counts.items()), columns=['Name', 'Count'])

name_counts_df = name_counts_df.sort_values(by='Count', ascending=False)

print(name_counts_df)
                  Name  Count
1          Kenny Omega     16
2        Kenta Kobashi     15
0      Kazuchika Okada     14
3     Mitsuharu Misawa     11
11     Bryan Danielson     11
..                 ...    ...
91          Naruki Doi      1
92          Dragon Kid      1
93     Genki Horiguchi      1
94           Ryo Saito      1
95  Konosuke Takeshita      1

[96 rows x 2 columns]

Answer: Kenny Omega has the most matches in the top 100, with 16.
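The pattern [A-Z][a-z]* [A-Z][a-z]* only recognises two capitalised words, so some names are missed or mangled: "CIMA" (row 97 above) is skipped, a name such as "CM Punk" comes out as "M Punk", and three-word names are cut short. A sketch of an alternative that splits each title on the separators visible in the data (" vs. ", "," and "&") and counts participants with collections.Counter. Tag-team names written with parentheses would need extra handling that is not covered here.

```python
import re
from collections import Counter

# Assumption: participants are separated by " vs. ", "," or "&",
# as in the match titles shown above
SEPARATORS = re.compile(r'\s*(?:\bvs\.|,|&)\s*')


def count_wrestlers(match_titles):
    """Count how many titles each participant appears in."""
    counts = Counter()
    for title in match_titles:
        # A set, so a wrestler is counted once per match
        names = {name.strip() for name in SEPARATORS.split(title) if name.strip()}
        counts.update(names)
    return counts
```

The result converts directly into the same table as above with pd.DataFrame(count_wrestlers(df_matches['Match']).most_common(), columns=['Name', 'Count']).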

q3: Which promotion has the most matches in the top 100?

promotion_counts = df_matches['Promotion'].value_counts().sort_values(ascending=False)
print(promotion_counts)
Promotion
New Japan Pro Wrestling                 35
World Wrestling Entertainment           14
All Japan Pro Wrestling                 12
All Elite Wrestling                     12
Ring Of Honor                            8
Pro Wrestling NOAH                       6
All Japan Women's Pro-Wrestling          4
World Wonder Ring Stardom                2
Total Nonstop Action Wrestling           1
DDT Pro Wrestling                        1
GAEA Japan                               1
Lucha Underground                        1
Japanese Women Pro-Wrestling Project     1
World Championship Wrestling             1
JTO                                      1
Name: count, dtype: int64

Answer: New Japan Pro Wrestling has the most matches in the top 100 with 35 matches.

q4: What is each promotion’s average WON rating?

average_won_ratings = df_matches.groupby('Promotion')['WON Rating'].mean().sort_values(ascending=False)

print(average_won_ratings)
Promotion
All Elite Wrestling                       5.5625
World Wonder Ring Stardom                    5.5
New Japan Pro Wrestling                 5.392857
Japanese Women Pro-Wrestling Project         5.0
Total Nonstop Action Wrestling               5.0
World Championship Wrestling                 5.0
All Japan Pro Wrestling                 4.979167
Ring Of Honor                           4.928571
All Japan Women's Pro-Wrestling         4.916667
World Wrestling Entertainment           4.892857
Pro Wrestling NOAH                      4.791667
JTO                                         4.75
DDT Pro Wrestling                            NaN
GAEA Japan                                   NaN
Lucha Underground                            NaN
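Name: WON Rating, dtype: object

Answer: All Elite Wrestling has the highest average WON rating (about 5.56 stars), followed by World Wonder Ring Stardom (5.5) and New Japan Pro Wrestling (about 5.39). DDT Pro Wrestling, GAEA Japan and Lucha Underground have no WON rating for their top-100 matches in the scraped data, so no average can be computed for them.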
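The averages rest on very different sample sizes: New Japan Pro Wrestling has 35 matches in the top 100, while JTO's 4.75 comes from a single match. A sketch of a hypothetical promotion_summary helper that reports how many rated matches sit behind each mean, converting the column to numbers first:

```python
import pandas as pd


def promotion_summary(df):
    """Average WON rating per promotion, alongside how many rated matches
    each average is based on."""
    # Coerce to float so missing ratings become NaN and the mean is numeric
    won = pd.to_numeric(df['WON Rating'], errors='coerce')
    # 'count' ignores NaN, so it is the number of matches that have a WON rating
    summary = won.groupby(df['Promotion']).agg(['mean', 'count'])
    return summary.sort_values('mean', ascending=False)
```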

q5: Select any single match and get the comments and ratings for that match into a data frame.

link1 = 'https://www.cagematch.net/?id=111&nr=4898&page=99'

match_req = requests.get(link1)

match_soup = BeautifulSoup(match_req.content, 'html.parser')

# Each comment has a header (user name and date) and a body (rating and text)
headers = match_soup.select('.CommentHeader')
contents = match_soup.select('.CommentContents')

comments_df = pd.DataFrame([{'User': h.text, 'Comment': c.text}
                            for h, c in zip(headers, contents)])

# The header holds the user name, followed by " wrote" and a DD.MM.YYYY date
comments_df['Date'] = comments_df['User'].str.extract(r'(\d{2}\.\d{2}\.\d{4})', expand=False)
comments_df['User'] = comments_df['User'].str.replace(r' wrote.*', '', regex=True)

# The user's rating is written in square brackets inside the comment, e.g. [10.0]
comments_df['Rating'] = comments_df['Comment'].str.extract(r'\[(\d+\.\d+)\]', expand=False).astype(float)
comments_df['Comment'] = comments_df['Comment'].str.replace(r'\[\d+\.\d+\]', '', regex=True)


def remove_first_and_last_quotes(text):
    """Drop the first double quote and a trailing double quote, if present."""
    text = re.sub(r'"', '', text, count=1)
    if text.endswith('"'):
        text = text[:-1]
    return text


comments_df['Comment'] = comments_df['Comment'].apply(remove_first_and_last_quotes)

print(comments_df)
                           User  \
0                 RealTeflonDon   
1                        cactus   
2                        TH0810   
3                   WilsonDrove   
4               archerinfection   
..                          ...   
344                 hatebreeder   
345                  GamePrince   
346                  The Denniz   
347  The Rated R Superstar EDGE   
348                        KASH   

                                               Comment        Date Rating  
0     Time stopped for me when this match took plac...  31.01.2025   10.0  
1     An absolute gem and a kick up the arse to WWE...  31.01.2025   10.0  
2     There are very few matches that get every asp...  30.01.2025   10.0  
3     For almost 11 years this was the last WWE mai...  21.01.2025   10.0  
4     One of the greatest matches of all time, one ...  16.01.2025   10.0  
..                                                 ...         ...    ...  
344   Freakin' Match of the Year! Das Match war an ...  18.07.2011   10.0  
345   Eins der besten Singles-Matches von John Cena...  18.07.2011    9.0  
346   Unglaubliche Crowd. Gänsehaut put. Match of t...  18.07.2011   10.0  
347   Sehr sehr gutes Match und eindeutig Match of ...  18.07.2011   10.0  
348   ****1/2 - Match of the Year 2011. Ich ging mi...  18.07.2011   10.0  

[349 rows x 4 columns]
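The Date column is still text in DD.MM.YYYY form, which sorts and groups incorrectly. A small sketch of a hypothetical typed_comments helper that parses it with an explicit format and makes Rating numeric (pd.to_numeric works whether Rating is already a float or still a string), so the ratings can be summarised over time:

```python
import pandas as pd


def typed_comments(df):
    """Return a copy with Date parsed as day.month.year and Rating as float."""
    out = df.copy()
    # Explicit format: cagematch dates are day first, e.g. 31.01.2025
    out['Date'] = pd.to_datetime(out['Date'], format='%d.%m.%Y')
    # Unparseable or missing ratings become NaN
    out['Rating'] = pd.to_numeric(out['Rating'], errors='coerce')
    return out
```

For example, typed = typed_comments(comments_df) followed by typed.groupby(typed['Date'].dt.year)['Rating'].mean() gives the average user rating per year in which comments were posted.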

Step 4

You can’t have matches without wrestlers. The following link contains the top 100 wrestlers, according to cagematch: https://www.cagematch.net/?id=2&view=statistics

*** Of the top 100, who has wrestled the most matches?

***** Of the top 100, which wrestler has the best win/loss?

link = 'https://www.cagematch.net/?id=2&view=statistics'


hot100_req = requests.get(link)


hot100_soup = BeautifulSoup(hot100_req.content, 'html.parser')

all_links = hot100_soup.select('.TCol a')

filtered_links = [link for link in all_links if link['href'].count('&') == 2]

df_wrestlers = []

for i in filtered_links:
    wrestler_name = i.text.strip()
    wrestler_href = i['href']
    
    ID = re.search(r'nr=(\d+)', wrestler_href).group(1)
    
    df_wrestlers.append({
        "Name": wrestler_name,
        "ID": ID,
    })

df_wrestlers = pd.DataFrame(df_wrestlers)

match_stats = []

for wrestler_id in df_wrestlers['ID']:
    # page=22 of a wrestler's profile holds the match statistics boxes
    stats_link = f'https://www.cagematch.net/?id=2&nr={wrestler_id}&page=22'
    stats_soup = BeautifulSoup(requests.get(stats_link).content, 'html.parser')
    boxes = stats_soup.select('.InformationBoxContents')
    # The first four boxes are total matches, wins, losses and draws
    match_stats.append({"Matches": boxes[0].text, "Wins": boxes[1].text,
                        "Losses": boxes[2].text, "Draws": boxes[3].text})

match_stats = pd.DataFrame(match_stats)


def extract_number(text):
    """Return the first integer that appears in the text."""
    return int(re.search(r'\d+', text).group())


for column in ['Matches', 'Wins', 'Losses', 'Draws']:
    match_stats[column] = match_stats[column].apply(extract_number)

df_wrestlers_stats = pd.concat([df_wrestlers, match_stats], axis=1)

q1: Of the top 100, who has wrestled the most matches?

most_matches = df_wrestlers_stats.sort_values(by='Matches', ascending=False)
print(most_matches)
                    Name    ID  Matches  Wins  Losses  Draws
95             Ric Flair  1091     4999  2553    1971    475
11  Jushin Thunder Liger   455     4370  2376    1871    123
14             Lou Thesz   930     4340  3204     339    797
60     Masaaki Mochizuki  2899     4183  2310    1752    121
84         Antonio Inoki  1096     3688  2929     459    300
..                   ...   ...      ...   ...     ...    ...
59              Jim Ross   439        9     5       3      1
34         Howard Finkel  2279        8     4       3      1
25         Gene Okerlund  1383        4     4       0      0
65           Cesar Duran  9189        1     0       1      0
22         Hiroyuki Unno  7522        0     0       0      0

[100 rows x 6 columns]

Answer: Ric Flair has wrestled the most matches of anyone in the top 100, with 4999.

q2: Of the top 100, which wrestler has the best win/loss?

df_wrestlers_stats['Win/Loss'] = df_wrestlers_stats['Wins'] / df_wrestlers_stats['Losses']

win_loss = df_wrestlers_stats.sort_values(by='Win/Loss', ascending=False)
print(win_loss)
                Name    ID  Matches  Wins  Losses  Draws  Win/Loss
25     Gene Okerlund  1383        4     4       0      0       inf
14         Lou Thesz   930     4340  3204     339    797  9.451327
84     Antonio Inoki  1096     3688  2929     459    300  6.381264
62  Bruno Sammartino   243     2047  1546     276    225  5.601449
29        Karl Gotch   946      553   370      94     89  3.936170
..               ...   ...      ...   ...     ...    ...       ...
87         Sami Zayn  1523     1772   795     947     30  0.839493
23      Bobby Heenan   770     1033   265     717     51  0.369596
17       Paul Heyman   664       65     6      55      4  0.109091
65       Cesar Duran  9189        1     0       1      0  0.000000
22     Hiroyuki Unno  7522        0     0       0      0       NaN

[100 rows x 7 columns]

Answer: Technically, Gene Okerlund has the best win/loss ratio (infinite, since he never lost a match), but he wrestled only 4 matches. Among those with a meaningful number of matches, Lou Thesz has the best ratio: 3204 wins to 339 losses over 4340 matches, a win/loss ratio of 9.45.
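The top 100 includes people who rarely wrestled, such as announcers Jim Ross, Howard Finkel and Gene Okerlund, and their handful of matches produce the inf and NaN ratios above. A sketch of a hypothetical best_record helper that applies a minimum-match cut-off (the default of 100 is an arbitrary assumption) and avoids dividing by zero losses:

```python
import pandas as pd


def best_record(stats, min_matches=100):
    """Rank wrestlers by win/loss ratio, ignoring anyone with fewer than
    `min_matches` matches (hypothetical cut-off)."""
    eligible = stats[stats['Matches'] >= min_matches].copy()
    # Share of all matches (draws included) that were won
    eligible['Win %'] = eligible['Wins'] / eligible['Matches'] * 100
    # Zero losses become NaN instead of producing an infinite ratio
    losses = eligible['Losses'].where(eligible['Losses'] > 0)
    eligible['Win/Loss'] = eligible['Wins'] / losses
    return eligible.sort_values('Win/Loss', ascending=False)
```

Calling best_record(df_wrestlers_stats) would replace the manual reasoning about Okerlund with an explicit, adjustable threshold.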

Step 5

With all of this work out of the way, we can start getting down to strategy.

First, what talent should WWE pursue? Advise carefully.

Answer: WWE should pursue Kenny Omega (from AEW), who has the most matches in the top 100 and a strong following among wrestling fans. He speaks English and is extremely consistent both in the ring and on the mic. The main obstacle is that, as one of the best, he could be expensive or hard to land, although the move to Netflix may make an agreement easier to reach. It is also worth noting that TNA and WWE have a deal, so there is no need to pursue TNA talent, since wrestlers can already appear in both companies.

Second, reconcile what you found in steps 3 and 4 with Netflix’s relationship with WWE. Use the data from the following page to help make your case: https://wrestlenomics.com/tv-ratings/

Answer: Viewership skyrocketed after the premiere on Netflix, especially for Raw. It has declined since, but it still averages about a million more viewers than before. The relationship is still new, but it has been a very good partnership so far, and it has also helped WWE expand globally. Helpfully, CM Punk and John Cena, who have one of the top 100 matches together, are both currently active with the company. WWE's storylines also set it apart from the other promotions. Lastly, many stars returned to WWE after the Royal Rumble. One of them is Charlotte Flair, daughter of Ric Flair, who has wrestled the most matches of anyone in the top 100. The star power of the Flair name draws viewers, especially since she has had an illustrious career of her own.

Third, do you have any further recommendations for WWE?

Answer: WWE should keep its focus on storylines, since they build a returning customer base. It should also look at how to bolster SmackDown, whose viewership has decreased since Raw's move to Netflix. WWE is already trying by mixing talent across its brands, but it could also build more compelling storylines around SmackDown (which I believe it is working towards, especially with Jacob Fatu gradually becoming a singles star).

import webbrowser

link_fck = "https://foass.1001010.com/linus/Seth/Aziz"
webbrowser.open(link_fck)
link_fck
'https://foass.1001010.com/linus/Seth/Aziz'